-
Notifications
You must be signed in to change notification settings - Fork 29
feat(heuristics): add Fake Email analyzer to validate maintainer email domain #1106
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
c6f35f7
to
8d29103
Compare
f945882
to
a7103e4
Compare
src/macaron/malware_analyzer/pypi_heuristics/metadata/fake_email.py
Outdated
Show resolved
Hide resolved
…l domains Signed-off-by: Amine <amine.raouane@enim.ac.ma>
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
759ab97
to
d99495c
Compare
…mail domain validation Signed-off-by: Amine <amine.raouane@enim.ac.ma>
d99495c
to
a8d373b
Compare
""" | ||
emailinfo = None | ||
try: | ||
emailinfo = validate_email(email, check_deliverability=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please make check_deliverability
configurable via defaults.ini
so that you can turn it off in unit tests? We try to avoid network calls in our unit tests.
analyzer: FakeEmailAnalyzer, pypi_package_json_asset_mock: MagicMock, mock_validate_email: MagicMock | ||
) -> None: | ||
"""Test analyzer passes when only maintainer_email is present and valid.""" | ||
email = "maintainer@example.net" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is example.net
supposed to be a valid domain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and no. The domain is technically valid, but it’s reserved by IANA and not intended for real-world use. So to ensure the email is actually usable by a real user, Should I add a list of reserved domains and TLDs and check against them before proceeding with validation or acceptance, or should I keep it as it is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the email validator packages has a test environment flag?
test_environment=False
: IfTrue
, DNS-based deliverability checks are disabled andtest
and**.test
domain names are permitted (see below). You can also setemail_validator.TEST_ENVIRONMENT
toTrue
to turn it on for all calls by default.
Would that maybe be better in a unit test? If a user does use those IANA reserved domains shouldn't it fail if we use check_deliverability
as they aren't set up to receive emails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but I tried to test both cases when deliverability
is enabled and when it's not. Setting test_environment=False
actually makes .test
and similar test domains valid, so it's similar to the case when check_deliverability=False
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. check_deliverability=True
will mean it'll make a network connection right? This could cause some issues with running the unit tests offline. Would it be possible to either mock a network response, or otherwise put this in an integration test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can just switch it here and it will follow the other path, so it won’t trigger the network connection.
validated_emails = info.get("validated_emails") | ||
assert isinstance(validated_emails, list) | ||
assert len(validated_emails) == 2 | ||
assert {"normalized": "author@example.com", "local_part": "author", "domain": "example.com"} in validated_emails |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same question, if example.com
is supposed to be a valid domain.
src/macaron/malware_analyzer/pypi_heuristics/metadata/fake_email.py
Outdated
Show resolved
Hide resolved
Signed-off-by: Amine <amine.raouane@enim.ac.ma>
95aa0cc
to
468d67e
Compare
"info": {"author_email": author_email, "maintainer_email": maintainer_email} | ||
} | ||
result, info = analyzer.analyze(pypi_package_json_asset_mock) | ||
if analyzer.check_deliverability: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What controls analyzer.check_deliverability
here? It defaults to True
right? So won't this branch always be run?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it defaults to True
, but I tried to handle both cases depending on what the user sets. So if the user explicitly sets check_deliverability=False
, it will follow the other path and skip that branch.
Summary
This PR adds a new heuristic analyzer called
FakeEmailAnalyzer
. It verifies the validity of maintainer email addresses listed in a PyPI package by checking both the format and the existence of MX records for their domains. This helps detect packages with fake or throwaway emails, which are often indicative of malicious intent.Description of changes
FakeEmailAnalyzer
that:detect_malicious_metadata_check.py
to include and invoke this new analyzer.Related issues
None
Checklist
verified
label should appear next to all of your commits on GitHub.